CRITAC - A Japanese Text Proofreading System

Authors

  • Koichi Takeda
  • Tetsunosuke Fujisaki
  • Emiko Suzuki
Abstract

CRITAC (CRITiquing using ACcumulated knowledge) is an experimental expert system for proofreading Japanese text. It detects mistypes, Kana-to-Kanji misconversions, and stylistic errors. The system combines Prolog-coded heuristic knowledge with conventional Japanese text processing techniques, which involve heavy computation and access to large language databases.

1. Introduction

Current advances in Japanese text processing are mainly due to the remarkable growth of the word processor market. Machine-readable Japanese text can now be easily prepared and distributed. This trend has spurred the research and development of further text processing applications such as machine translation and text-to-speech conversion [SAKAS8310] [MIYAG8310]. However, some fundamental text processing procedures are still missing for Japanese text. For example, counting the number of words in a text is a difficult task, since words are not separated by blanks.

Our experimental system CRITAC tries to overcome these Japanese language problems. Proofreading (or critiquing, to some extent) [CHER80] [HEIDJ82] has been chosen as our research domain because it involves many text processing techniques and is one of the most important functions currently required and lacking.

In this paper, we introduce CRITAC concepts and facilities, including a conceptual representation of Japanese text called "structured text", which handles meaningful objects (e.g., sentences and words), and proofreading using heuristic rules over the structured text. The structured text consists of a set of Prolog [CLOCM81] facts and predicates, each of which represents an object or a class of objects in the text. Because of this high-level representation, human proofreading knowledge can be easily mapped into Prolog rules. Two user-friendly representations of text, called the "source" and "KWIC" (Key-Word-In-Context) views, are derived from the structured text. CRITAC provides users with editing and proofreading functions defined over these views.

The notion of structured text, we believe, is not restricted to the Japanese language. Our approach for languages other than Japanese is discussed in the Conclusion.

2. CRITAC System Overview

In this section we outline CRITAC and its underlying concepts. As shown in Figure 1, the heart of CRITAC lies in its architecture. It consists of three major components: a user interface, a preprocessor, and a proofreading knowledge base.

[Figure 1 diagram not reproduced. Labeled elements: word processor, machine-readable text, file transfer, text editor, text compiler, SQL/DS dictionary server, preprocessor (segmentation, word recognition), user interface (two views), proofreading knowledge base, stored structured text.]

Figure 1. CRITAC Configuration: The preprocessor generates the structured text from given text. The proofreading knowledge base currently consists of about 30 Prolog proofreading rules for the structured text. The user interface handles two external views and facilitates use of the SQL/DS online dictionary server and text compiler.
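
To make the ideas of word segmentation, structured text, and the KWIC view more concrete, here is a minimal Python sketch. It is not CRITAC's implementation: the paper uses Prolog facts and large dictionaries, whereas this sketch assumes a tiny hypothetical lexicon (`LEXICON`), a greedy longest-match segmenter, and illustrative names (`Word`, `segment`, `structure`, `kwic`) that do not come from the source.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would consult large dictionaries.
LEXICON = {"私", "は", "日本語", "の", "文章", "を", "校正", "する"}


@dataclass(frozen=True)
class Word:
    """One word object of the structured text (loosely analogous to a fact)."""
    sentence: int  # index of the containing sentence
    position: int  # index of the word within that sentence
    surface: str   # the word as written


def segment(sentence, lexicon=LEXICON):
    """Greedy longest-match segmentation (an assumed technique).

    Characters not covered by the lexicon become one-character words.
    """
    max_len = max(map(len, lexicon))
    words, i = [], 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


def structure(text):
    """Split text on the Japanese full stop and return a list of Word objects."""
    facts = []
    sentences = [s for s in text.split("。") if s]
    for s_idx, sent in enumerate(sentences):
        for w_idx, surface in enumerate(segment(sent)):
            facts.append(Word(s_idx, w_idx, surface))
    return facts


def kwic(facts, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    by_sentence = {}
    for f in facts:
        by_sentence.setdefault(f.sentence, []).append(f.surface)
    rows = []
    for f in facts:
        if f.surface == keyword:
            words = by_sentence[f.sentence]
            left = "".join(words[max(0, f.position - width):f.position])
            right = "".join(words[f.position + 1:f.position + 1 + width])
            rows.append((left, keyword, right))
    return rows


facts = structure("私は日本語の文章を校正する。日本語の校正。")
print(len(facts))  # word count, which needs segmentation first
for row in kwic(facts, "校正"):
    print(row)
```

Once the text is held as word objects indexed by sentence and position, counting words becomes a matter of counting objects, and a KWIC listing is a simple query over the same data. A proofreading rule could be written in the same way, as a condition over neighboring word objects.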


Similar resources

CRITAC - An Experimental System for Japanese Text Proofreading

This paper describes an experimental expert system for proofreading Japanese text. The system is called CRITAC (CRITiquing using Accumulated knowledge). It can detect typographical errors, Kana-to-Kanji conversion errors, and stylistic errors in Japanese text. We describe the basic concepts and features of CRITAC, including preprocessing of text, a high-level text model, Prolog-coded heuristic ...


An Example-Based Japanese Proofreading System for Offshore Development

Yuchang CHENG, Tomoki NAGASE (Speech & Language Technologies Laboratory, Media Processing Systems Laboratories, FUJITSU LABORATORIES LTD., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa 211-8588, Japan). Abstract: More than 70% of Japanese IT companies are engaged in the offshore development of their products in China. However, a decrea...


Automated Rule Acquisition and Application to Japanese-Chinese Machine Translation (Yamamoto)

Automated proofreading, or the rewriting of generated outputs, is discussed in this paper. We propose a new method of proofreading, which consists of an automatic rule acquisition module and its application module. Proofreading rules are described based on n-grams. In the rule acquisition module, provisional rules are collected and then filtered out by a "timid" policy. We utilize four kinds of screenin...


Proofreading in Young and Older Adults: The Effect of Error Category and Comprehension Difficulty

Proofreading text relies on stored knowledge, language processing, and attentional resources. Age differentially affects these constituent abilities: while older adults maintain word knowledge and most aspects of language comprehension, language production and attention capacity are impaired with age. Research with young adults demonstrates that proofreading is more attentionally-demanding for ...


A semantic proofreading tool for all languages based on a text repository

A method for finding lexical, syntactic, or semantic errors in text in any language is introduced. It is based on a large text repository. The idea is to "follow everybody else", that is, to compare the sentence offered by the user to similar sentences in the text repository and to suggest alternative words when appropriate. This concept offers the possibility of taking proofreading further ...




Publication date: 1986